Abstract

Alternative novel measures of the distance between any two partitions
of an n-set are proposed and compared, together with a main existing one,
namely partition-distance $D(\cdot,\cdot)$. The comparison is carried out by
checking their restriction to modular elements of the partition lattice, as
well as in terms of suitable classifiers. Two of the new measures are obtained
through the size, a function mapping every partition into the number of atoms
finer than that partition. One of these size-based distances extends to
geometric lattices the traditional Hamming distance between subsets, when
the latter are regarded as hypercube vertices or binary n-vectors. After
carefully framing the environment, a main comparison finally results
from the following bounding problem: for every value $k$, with $0 < k < n$,
of partition-distance $D(\cdot,\cdot)$, determine the minimum and maximum of
the indicator-Hamming distance $\delta_{IH}(P, Q)$ proposed here over all pairs
of partitions $P, Q$ such that $D(P, Q) = k$.

Key words: partition lattice, modular element, distance measure, Hamming
distance, sub- and super-modular partition function, clustering.
MSC 2010 : 03C13, 03G10, 05A18, 06B15, 06C10, 06D05, 11B73. 



1 Introduction 

Over the last decade, considerable attention has been devoted to measuring
the distance between partitions (as well as between and/or within collections of 
partitions). The issue arises, in general, when making similarity comparisons 
between clusterings [1 H E M EE CCD HZ] • 

The problem of quantifying the distance between partitions of a finite set is
here approached with a specific combinatorial target: the proposed measure
aims at taking into account the coarsening, meet and join relations of the
partition lattice in exactly the same way as the traditional Hamming distance
between subsets does with inclusion, intersection and union. Put differently,
the objective is to reproduce the symmetric difference between subsets when
measuring distances between partitions.

Although the analysis adopts such a focused and somewhat theoretical
perspective, the outcome is a variety of novel partition distance measures,
each possibly meeting a different application need. In particular, the measure
that actually translates the traditional Hamming distance between subsets in
terms of partitions appears to evaluate differences in a very accurate and
granular manner.

Meet, join and order relations of the subset and partition lattices, as well as
their distinctive features and what renders an element of a lattice modular, are
described in [1] [14] [15]. In particular, modular elements of the partition
lattice are extensively dealt with in the sequel. Also, partitions are mostly
treated as collections of atoms of the partition lattice, and these atoms are
modular. More generally, the approach leads to work with linear dependence [16],
commonly arising in geometric lattices. In a way, the indicator-Hamming distance
measure proposed below fully exploits such a linear dependence for evaluating
differences between partitions.

The next section details two simple ways of translating the Hamming distance
between subsets in terms of partitions: one is through the symmetric
difference while the other is through the rank. In section 3 they are compared
with the partition-distance proposed in [8] by checking their behavior over
pairs of modular partitions. In section 4 these three measures are characterized
in terms of suitable classifiers (applying to any complemented lattice). Section
5 focuses on atoms of the partition lattice, populating the first level of the
Hasse diagram. The remainder of the paper looks at partitions precisely in terms
of their representations as a join of atoms. Linear dependence means that the
generic partition has many such representations. The size of a partition is the
number of atoms finer than that partition or, equivalently, the cardinality of the largest
representation of that partition as a join of atoms [H] . It is shown to be a strictly 
monotone and super-modular partition function. Section 6 provides and
characterizes two novel partition distance measures: one is size-based, using
the size just like the rank-based distance (from section 2) uses the rank, while
the other is named indicator-Hamming and proposed as the faithful translation of
the Hamming distance between subsets. In fact, it measures the distance between
any two partitions by counting the number of atoms finer than either one but
not both. Section 7 details the features displayed by this IH distance measure
by bounding its maximum and minimum for every value of partition-distance.
Essentially, apart from providing the sought combinatorial congruence, the
former distance is very precise and granular at quantifying differences between
partitions: its range is large (much larger than those of all other distances
appearing here), and this is very useful for measuring distances between
partitions from the most populated levels of the Hasse diagram, where more
distinct types of differences between partitions actually exist. Final remarks
are contained in section 8.

2 Symmetric difference and rank 

For a finite set $N = \{1, \ldots, n\}$ (or $[n]$), let $(2^N, \cap, \cup)$ and
$(\mathcal{P}^N, \wedge, \vee)$ denote the corresponding subset and partition
lattices, with inclusion $\supseteq$ and coarsening $\geq$ as order relations,
respectively. Both are atomic, and the former is distributive while the latter
is geometric indecomposable [1] [15].

The distance between elements of an ordered set is to be measured in terms
of the order relation. On the other hand, measures of the difference between
elements of a generic set are commonly referred to as Hamming distances when
elements are firstly represented as arrays, and next the difference between any
two of them simply reduces to counting the number of entries where their two
array representations differ. In discrete settings, measuring distances seems to
naturally reduce to counting.

The Hamming distance $d(A, B)$ between any two subsets $A, B \in 2^N$ is

$$d(A, B) = |A \Delta B| = |A \setminus B| + |B \setminus A| = r(A \cup B) - r(A \cap B), \qquad (1)$$

$r : 2^N \to \mathbb{Z}_+$ being the rank function: $r(A) = |A|$ for all
$A \in 2^N$. In words, $d(\cdot,\cdot)$ counts how many $i \in N$ are included
in either $A$ or else $B$, but not in both. Note that such elements $i \in N$
are the atoms $\{i\} \in 2^N$ of the subset lattice. This is a Hamming distance
in that subsets $A \in 2^N$ are firstly represented as binary vectors through
their characteristic function $\chi_A : N \to \{0, 1\}$ defined by
$\chi_A(i) = 1$ if $i \in A$ and $\chi_A(i) = 0$ if $i \in N \setminus A$, and
next the distance between any two subsets $A, B \in 2^N$ is the number of
entries where $\chi_A$ and $\chi_B$ differ, that is, the cardinality of their
symmetric difference $A \Delta B$.
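As a small illustration of (1), the following Python sketch builds the two characteristic vectors and counts the entries where they differ. The function name `hamming_subsets` and the sample sets are assumptions made only for this example.

```python
def hamming_subsets(a, b, n):
    """Hamming distance between subsets a, b of N = {1, ..., n}.

    Both subsets are first turned into binary characteristic vectors,
    then the entries where the two vectors differ are counted.
    """
    chi_a = [1 if i in a else 0 for i in range(1, n + 1)]
    chi_b = [1 if i in b else 0 for i in range(1, n + 1)]
    return sum(x != y for x, y in zip(chi_a, chi_b))


# Illustrative data (not from the paper): N = {1, ..., 5}.
A, B = {1, 2, 3}, {3, 4}
d = hamming_subsets(A, B, 5)
```

The same value is obtained from the other two expressions in (1), namely the cardinality of the symmetric difference and the rank difference between union and intersection.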

Any subset $A \in 2^N$ has a unique complement $A^c = N \setminus A$. For all
non-empty subsets $\emptyset \subset A \subseteq N$ and all partitions
$P \in \mathcal{P}^N$, denote by $P^A$ the partition of $A$ induced by $P$, and
let $\mathcal{P}^A$ be the sub-lattice of partitions of $A$. Partition-distance
$D : \mathcal{P}^N \times \mathcal{P}^N \to \{0, 1, \ldots, n-1\}$ given by [8] is

$$D(P, Q) = \min\{|A^c| : \emptyset \subset A \subseteq N, \; P^A = Q^A\}. \qquad (2)$$

That is, the minimum number of elements $i \in N$ that must be deleted in
order for the two residual induced partitions to coincide. Also, $D(P, Q)$ is the
minimum number of elements that must be moved between (or away from) blocks
of P so that the resulting partition equals Q (see [51 p. 160]). Although there 
exist Hamming distances between partitions in the literature [3J [TU] , partition- 
distance $D(\cdot,\cdot)$ is not among them, because in (2) there is no count of
non-matched entries in some array representations of $P$ and $Q$. On the other
hand, there are two immediate ways of paralleling (1) when switching from subsets
to partitions. One is treating partitions as special collections of subsets, while
the other is using the rank of the partition lattice just like $r(\cdot)$ appears
in (1). These two alternatives are now briefly detailed.
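Definition (2) can be evaluated directly on small examples. The sketch below is a brute-force reading of (2), not the algorithm of [8]: it scans subsets from the largest downwards and stops at the first one on which the two induced partitions coincide, so its cost is exponential in n. The helper names `induced`, `partition_distance` and `parse` are hypothetical.

```python
from itertools import combinations


def parse(text):
    """Hypothetical helper: '12|34' -> {frozenset({1, 2}), frozenset({3, 4})}."""
    return {frozenset(int(c) for c in block) for block in text.split("|")}


def induced(partition, subset):
    """Partition of `subset` induced by `partition` (empty traces are dropped)."""
    return {block & subset for block in partition if block & subset}


def partition_distance(p, q):
    """Brute-force D(P, Q) as in (2): n minus the cardinality of a largest
    subset A on which P and Q induce the same partition."""
    ground = sorted(set().union(*p))
    n = len(ground)
    for size in range(n, 0, -1):
        for subset in combinations(ground, size):
            a = frozenset(subset)
            if induced(p, a) == induced(q, a):
                return n - size
    return n  # only reached for an empty ground set


d_incomparable = partition_distance(parse("12|34"), parse("13|24"))
d_top_bottom = partition_distance(parse("1234"), parse("1|2|3|4"))
```

Since any two partitions agree on every 1-cardinal subset, the loop always returns by the time it reaches subsets of size one, which is why the range of D stops at n - 1.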

Partitions may well be looked at as subsets of $2^N$, in that $P \subseteq 2^N$
or equivalently $P \in 2^{2^N}$ for all $P \in \mathcal{P}^N$. Hence, the
distance $\delta_{SD}(P, Q)$ between any two partitions $P$ and $Q$ may be
measured as the cardinality

$$\delta_{SD}(P, Q) = |P \Delta Q| = |P \setminus Q| + |Q \setminus P| = |P \cup Q| - |P \cap Q| \qquad (3)$$

of their symmetric difference (SD). That is, the number of distinct
$A \in 2^N$ such that either $A \in P$ or else $A \in Q$, but not both. This
distance counts the number of non-matched entries in array representations
$\chi_P, \chi_Q : 2^N \to \{0, 1\}$, with $\chi_P(A) = 1$ if $A \in P$ and
$\chi_P(A) = 0$ otherwise for all $A \in 2^N$, and similarly for $Q$.

Any lattice has a rank function $r(\cdot)$, mapping elements into their level of
the Hasse diagram. For the partition lattice, $r : \mathcal{P}^N \to \mathbb{Z}_+$
is $r(P) = n - |P|$. Given how the rank of subsets appears in (1) above, a
further rank-based (RB) partition distance measure is

$$\delta_{RB}(P, Q) = r(P \vee Q) - r(P \wedge Q) = |P \wedge Q| - |P \vee Q|, \qquad (4)$$

where $P \wedge Q$ is the coarsest partition finer than both $P, Q$ and
$P \vee Q$ is the finest partition coarser than both $P, Q$. Note that any block
$A \in P \cap Q$ of both partitions is also a block of both $P \wedge Q$ and
$P \vee Q$, and vice versa.

These simple attempts to parallel (1) already provide two further partition
distance measures to be compared with partition-distance $D(\cdot,\cdot)$. This
is done hereafter, firstly in terms of the behavior on modular partitions, and
secondly in terms of some suitable classifiers.
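The two measures in (3) and (4) are easy to compute once partitions are stored as sets of blocks. The following Python sketch assumes that representation (a set of frozensets) and uses hypothetical helper names; the join is obtained by repeatedly merging overlapping blocks, which is one of several possible ways to compute it.

```python
def parse(text):
    """Hypothetical helper: '12|34' -> {frozenset({1, 2}), frozenset({3, 4})}."""
    return {frozenset(int(c) for c in block) for block in text.split("|")}


def meet(p, q):
    """Coarsest partition finer than both: all non-empty pairwise intersections."""
    return {a & b for a in p for b in q if a & b}


def join(p, q):
    """Finest partition coarser than both: start from the blocks of p and,
    for each block of q, merge every current block that overlaps it."""
    blocks = [set(b) for b in p]
    for b in q:
        touching = [c for c in blocks if c & b]
        rest = [c for c in blocks if not (c & b)]
        blocks = rest + [set().union(b, *touching)]
    return {frozenset(c) for c in blocks}


def delta_sd(p, q):
    """Symmetric-difference distance (3): blocks of either partition, not both."""
    return len(p ^ q)


def delta_rb(p, q):
    """Rank-based distance (4): r(P v Q) - r(P ^ Q) = |P ^ Q| - |P v Q|."""
    return len(meet(p, q)) - len(join(p, q))


P, Q = parse("12|34"), parse("13|24")
sd, rb = delta_sd(P, Q), delta_rb(P, Q)
```

For this pair the meet is the bottom partition and the join is the top one, so the rank-based distance equals n - 1 = 3, while the symmetric-difference distance counts all four blocks.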

2.1 Distances between modular partitions 

Modular elements and modular pairs (of elements) are very important for
comprehending geometric lattices [1] [14] [15], making it useful to observe the
behavior of distance measures over pairs of modular partitions (not to be
confused with modular pairs of partitions).

The bottom and top elements of the partition lattice
$(\mathcal{P}^N, \wedge, \vee)$ are, respectively,
$P_\bot = \{\{1\}, \ldots, \{n\}\}$ and $P^\top = \{N\}$. Both are modular
elements of the lattice. The collection of all modular partitions is

$$\mathcal{P}^N_{mod} = \{\{A\} \cup P_\bot^{A^c} : \emptyset \subset A \subseteq N\},$$

with $P^\emptyset = \emptyset$ for all $P \in \mathcal{P}^N$, and where
$\{A\} \cup P_\bot^{A^c}$ is the partition with all $i \in A$ in a common block
and every $j \in A^c$ in a 1-cardinal block. Note that all the $n$ atoms of
$2^N$ (that is, all elements $i \in N$) collapse into a unique modular
partition, which is the bottom one $P_\bot$. Hence,
$|\mathcal{P}^N_{mod}| = 2^n - n$.

When restricted to $\mathcal{P}^N_{mod} \times \mathcal{P}^N_{mod}$,
partition-distance $D(\cdot,\cdot)$ above behaves as follows:
$D(P^\top, P_\bot) = n - 1$, while for $\emptyset \subset A \subset N$

$$D(\{A\} \cup P_\bot^{A^c}, P_\bot) = |A| - 1,$$
$$D(\{A\} \cup P_\bot^{A^c}, P^\top) = |A^c| = n - |A|,$$
$$D(\{A\} \cup P_\bot^{A^c}, \{A^c\} \cup P_\bot^{A}) = |A| - 1 + |A^c| - 1 = n - 2.$$

In general, for $\emptyset \subset A, B \subset N$ and $A \neq B \neq A^c$,

$$D(\{A\} \cup P_\bot^{A^c}, \{B\} \cup P_\bot^{B^c}) = n - |A \cap B| - |(A \cup B)^c|.$$

This is obtained by firstly determining a largest subset $A' \in 2^N$ where $P$
and $Q$ induce the same partition $P^{A'} = Q^{A'}$, and next counting the
cardinality of its complement. In $(P^\top, P_\bot)$ the sought largest subset
is any $A'$ such that $|A'| = 1$ (any atom of $2^N$). In
$(\{A\} \cup P_\bot^{A^c}, P_\bot)$ it is any $A' = A^c \cup i$ for some
$i \in A$. In $(\{A\} \cup P_\bot^{A^c}, P^\top)$ it is $A' = A$. In
$(\{A\} \cup P_\bot^{A^c}, \{A^c\} \cup P_\bot^{A})$ it is any $A' = \{i, j\}$
such that $i \in A$, $j \in A^c$. Finally, for the general case
$(\{A\} \cup P_\bot^{A^c}, \{B\} \cup P_\bot^{B^c})$, these two modular
partitions are seen to coincide when restricted to largest subset
$A' = (A \cap B) \cup (A^c \cap B^c) = (A \cap B) \cup (A \cup B)^c$ for all
$A, B \in 2^N$, $B \neq A^c$. It may be noted that
$D(\{A\} \cup P_\bot^{A^c}, \{B\} \cup P_\bot^{B^c}) = d(A, B)$ as given by (1).

The restriction of distance $\delta_{SD}$ above to pairs of modular partitions
is $\delta_{SD}(P^\top, P_\bot) = n + 1$, while $\emptyset \subset A \subset N$
yields

$$\delta_{SD}(\{A\} \cup P_\bot^{A^c}, P_\bot) = |A| + 1,$$
$$\delta_{SD}(\{A\} \cup P_\bot^{A^c}, P^\top) = n - |A| + 2,$$
$$\delta_{SD}(\{A\} \cup P_\bot^{A^c}, \{A^c\} \cup P_\bot^{A}) = n + 2.$$

In general, $\emptyset \subset A, B \subset N$ and $A \neq B \neq A^c$ yield

$$\delta_{SD}(\{A\} \cup P_\bot^{A^c}, \{B\} \cup P_\bot^{B^c}) = |A^c| + 1 + |B^c| + 1 - 2|A^c \cap B^c|$$
$$= 2(n + 1) - (|A| + |B|) - 2|(A \cup B)^c|.$$

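Concerning $\delta_{RB}$, firstly consider that for $A, B \in 2^N$ the meet and
join of the two corresponding modular partitions are

$$(\{A\} \cup P_\bot^{A^c}) \wedge (\{B\} \cup P_\bot^{B^c}) = \{A \cap B\} \cup P_\bot^{(A \cap B)^c},$$

with possibly $A \cap B = \emptyset$, and

$$(\{A\} \cup P_\bot^{A^c}) \vee (\{B\} \cup P_\bot^{B^c}) =
\begin{cases}
\{A \cup B\} \cup P_\bot^{(A \cup B)^c} & \text{if } A \cap B \neq \emptyset, \\
\{A, B\} \cup P_\bot^{(A \cup B)^c} & \text{if } A \cap B = \emptyset.
\end{cases}$$

Accordingly, the restriction of distance $\delta_{RB}$ above to pairs of modular
partitions is $\delta_{RB}(P^\top, P_\bot) = n - 1 = D(P^\top, P_\bot)$, while
$\emptyset \subset A \subset N$ yields

$$\delta_{RB}(\{A\} \cup P_\bot^{A^c}, P_\bot) = |A| - 1 = D(\{A\} \cup P_\bot^{A^c}, P_\bot),$$
$$\delta_{RB}(\{A\} \cup P_\bot^{A^c}, P^\top) = n - |A| = D(\{A\} \cup P_\bot^{A^c}, P^\top),$$
$$\delta_{RB}(\{A\} \cup P_\bot^{A^c}, \{A^c\} \cup P_\bot^{A}) = n - 2 = D(\{A\} \cup P_\bot^{A^c}, \{A^c\} \cup P_\bot^{A}).$$

In general, $\emptyset \subset A, B \subset N$ and $A \neq B \neq A^c$ yield

$$\delta_{RB}(\{A\} \cup P_\bot^{A^c}, \{B\} \cup P_\bot^{B^c}) =
\begin{cases}
|A| + |B| - 2 & \text{if } A \cap B = \emptyset, \\
|A \cup B| - |A \cap B| & \text{if } A \cap B \neq \emptyset.
\end{cases}$$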

Despite the common range, $\delta_{RB}(\cdot,\cdot)$ and $D(\cdot,\cdot)$ do not
coincide even when restricted to modular partitions (see case
$A \cap B \neq \emptyset$ above). Great differences may be checked to arise over
pairs of partitions $P, Q$ where one covers the other, denoted $P >^* Q$,
meaning $P > Q$ and there is no $P' \in \mathcal{P}^N$ such that $P > P' > Q$.
For subsets, $A \supset^* B$ when $A = B \cup i$ for some $i \in B^c$.

Perhaps these behaviors help to figure out how the three distance measures
work, but still the number
$|\mathcal{P}^N \setminus \mathcal{P}^N_{mod}| = B_n - 2^n + n$ of non-modular
partitions is huge for relevant $n$, where $B_n$ is the ($n$-th Bell) number of
partitions of an n-set [5]. Accordingly, some general tools for comparison are
now provided.



3 Partition distance measures: classifiers 

Complementation [1] [14] [15] in the partition and subset lattices acts in very
different manners: while every subset has a unique complement (see above),
every partition $P \in \mathcal{P}^N$ has at least one complement ($n > 1$), but
non-modular ones have many. They are all those $P' \in \mathcal{P}^N$ such that
$P \wedge P' = P_\bot$ as well as $P \vee P' = P^\top$. For every partition
$P \in \mathcal{P}^N$, let $\mathcal{P}^c_P$ contain all its complements.
A partition distance measure
$\delta : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ should satisfy

• $\delta(P, Q) = 0 \Leftrightarrow P = Q$ for all $P, Q \in \mathcal{P}^N$ (antisymmetry),

while further conditions may be the following:

1. $\max_{P, Q \in \mathcal{P}^N} \delta(P, Q) = f(n)$ (f-maximality),

• $f(n + 1) > f(n)$ (strong f-monotonicity),

• $f(n + 2) + f(n) > 2 f(n + 1)$ for all $n \in \mathbb{N}$ (strong f-convexity),

2. $\max_{P, Q \in \mathcal{P}^N} \delta(P, Q) = \max_{P, Q \in \mathcal{P}^N_{mod}} \delta(P, Q)$ (mod-maximality),

3. $\max_{P, Q \in \mathcal{P}^N} \delta(P, Q) = \delta(P_\bot, P^\top)$ ($\bot\top$-maximality),

4. $\max_{P', Q' \in \mathcal{P}^N} \delta(P', Q') = \delta(P, Q)$ for all $P \in \mathcal{P}^N$, $Q \in \mathcal{P}^c_P$ (co-maximality),

5. $\delta(P \wedge Q, P \vee Q) \geq \delta(P, Q)$ for all $P, Q \in \mathcal{P}^N$ (super-modularity),

6. $\delta(P \wedge Q, P \vee Q) \leq \delta(P, Q)$ for all $P, Q \in \mathcal{P}^N$ (sub-modularity),

7. $\delta(P \wedge Q, P \vee Q) = \delta(P, Q)$ for all $P, Q \in \mathcal{P}^N$ (modularity).

The preliminary statement is obvious: there is no distance between any 
partition and itself as well as, conversely, if there is no distance between two 
partitions then they coincide (see also [H def. 3]). 

The first condition states that the maximum distance between two partitions
of an n-set is a function $f : \mathbb{N} \to \mathbb{Z}_+$ of $n$ only. Then,
antisymmetry entails $f(1) = 0$, as there is a unique partition of a singleton.
In addition, the first $f(n + 1) - f(n)$ and second
$f(n + 2) - f(n + 1) - (f(n + 1) - f(n))$ differences may be both strictly
positive.

Conditions 2-4 all select a region of the product lattice V N x V N where 
the measure has to surely attain its maximum, without excluding that such a 
maximum may be also attained elsewhere. Specifically, condition 2 states that 
the maximum distance between any two partitions of a n-set is the same as that 
observed as the maximum distance between any two modular partitions of the 
set. Condition 3 requires, in addition, that the pair consisting of the bottom and 
top partitions is among the maximizers of the distance. Condition 4 requires, 
in addition, that any pair consisting of a partition and one of its complements 
is among the maximizers of the distance. Hence, each entails the preceding one:
$4 \Rightarrow 3 \Rightarrow 2$.

A main observation for discussing conditions 5-7 is that partition distance
measures have to act on pairs $P, Q$ that are incomparable in terms of
coarsening $\geq$, that is $P \not\geq Q \not\geq P$ (hence they are excluded
from the incidence algebra of the partition lattice [1] [15]). In this case, it
may be important to know if a distance measure behaves differently depending on
whether the two involved partitions are comparable or not. More precisely, the
issue is comparing distance $\delta(P, Q)$ with the most similar distance
between partitions that are comparable, namely
$\delta(P \wedge Q, P \vee Q)$. In the Hasse diagram, the left-right distance
between incomparable partitions $P, Q$ is replaced with the up-down distance
between $P \vee Q$ and $P \wedge Q$. In this view, a sub-(super-)modular
distance measure translates the idea that by switching from an incomparable pair
to the nearest comparable one the distance decreases (increases). More simply, a
distance measure is modular when it deals with both comparable and incomparable
pairs in exactly the same manner, being a maximal sub-modular and minimal
super-modular one.

Classifiers 5-7 borrow their names from lattice functions
$h : X^N \to \mathbb{R}$, taking real values on a lattice
$(X^N, \wedge, \vee)$ with meet $\wedge$, join $\vee$ (and, possibly, built upon
some finite set $N$ as above). Such functions are sub-modular when
$h(x \vee y) + h(x \wedge y) \leq h(x) + h(y)$ for all pairs $x, y \in X^N$ of lattice elements,
and are key tools in combinatorial theory and optimization [1] [5j [3 115j . Super- 
modularity obtains when the inequality is reversed. Lattice functions satisfying
both sub- and super-modularity are mostly referred to as modular (or additive,
or valuations). The literature is generally concerned more with modular set
functions than with modular partition functions; the reason is simple: the only
way a function can be modular in the partition lattice is by assigning the same
constant value to every partition [1, exercise 12 (ii), p. 195]. It must be
stressed, though, that these names borrowed from lattice functions are here
applied, instead, to distance measures. The latter map pairs of lattice
elements, while a function maps lattice elements. Hence, a modular partition
distance measure is reasonable (as long as it is not built upon a modular
partition function, see below).

3.1 Characterization 

Conditions 1-7 apply to any complemented lattice, and thus straightforwardly
allow one to classify the Hamming distance
$d(\cdot,\cdot) : 2^N \times 2^N \to \{0, 1, \ldots, n\}$ between subsets in (1)
above: $d(\cdot,\cdot)$ simply satisfies all conditions apart from strong
f-convexity, as $f(n) = n$. In this view, the RB partition distance measure
$\delta_{RB}(\cdot,\cdot)$ defined by (4) above behaves exactly the same as
$d(\cdot,\cdot)$, satisfying all conditions apart from strong f-convexity, with
$f(n) = n - 1$. Conversely, partition-distance $D(\cdot,\cdot)$ and SD distance
$\delta_{SD}(\cdot,\cdot)$ (from (2) and (3) above) only satisfy certain
conditions out of 1-7, and appear substantially different from
$\delta_{RB}(\cdot,\cdot)$ (apart from the immediate check that
$D(\cdot,\cdot)$ satisfies f-maximality and strong f-monotonicity, but not
strong f-convexity, as $f(n) = n - 1$, like $\delta_{RB}(\cdot,\cdot)$).

Claim 1 Partition-distance $D(\cdot,\cdot)$ given by (2) is super-modular:

$$D(P \vee Q, P \wedge Q) - D(P, Q) \geq 0 \text{ for all } P, Q \in \mathcal{P}^N.$$

Proof: Partition-distance $D(P, Q)$ is $n - |A|$ where $A \in 2^N$ is a largest
subset satisfying $P^A = Q^A$, while partition-distance
$D(P \vee Q, P \wedge Q)$ is $n - |B|$ where $B$ is a largest subset satisfying
$(P \vee Q)^B = (P \wedge Q)^B$. What remains to note is
$(P \vee Q)^A \geq P^A \geq (P \wedge Q)^A \leq Q^A \leq (P \vee Q)^A$ for all
$A \in 2^N$. This means that for every $A \in 2^N$, if
$(P \vee Q)^A = (P \wedge Q)^A$, then $P^A = Q^A$. ■

For example, let $N = \{1, 2, 3, 4\}$ and consider partitions
$P, Q \in \mathcal{P}^N$ with $P = \{12|34\}$ and $Q = \{13|24\}$, where $|$
separates blocks. Then, $P \vee Q = P^\top$ and $P \wedge Q = P_\bot$, and thus
$D(1234, 1|2|3|4) = 3 > 2 = D(12|34, 13|24)$.

Claim 2 Distance measure $\delta_{SD}$ given by (3) is super-modular:

$$\delta_{SD}(P \vee Q, P \wedge Q) - \delta_{SD}(P, Q) \geq 0 \text{ for all } P, Q \in \mathcal{P}^N.$$

Proof: As $\delta_{SD}(P, Q)$ counts the number of blocks of either $P$ or $Q$
but not both, it must be shown that the way such blocks are further partitioned
in $P \wedge Q$ and merged in $P \vee Q$ yields an overall number of blocks no
smaller than $\delta_{SD}(P, Q)$. In fact, this is evident when considering that
the partition lattice is the polygon matroid [1, theorem 6.23, p. 274], and any
matroid has a sub-modular rank function [1, rank axioms 6.14, p. 265] (see
above). That is, $r(P \vee Q) + r(P \wedge Q) \leq r(P) + r(Q)$ for all
$P, Q \in \mathcal{P}^N$. Then,

$$n - |P \vee Q| + n - |P \wedge Q| \leq n - |P| + n - |Q|,$$
$$|P \vee Q| + |P \wedge Q| \geq |P| + |Q|,$$
$$|(P \vee Q) \setminus (P \wedge Q)| + |(P \wedge Q) \setminus (P \vee Q)| \geq |P \setminus Q| + |Q \setminus P|,$$
$$\delta_{SD}(P \vee Q, P \wedge Q) \geq \delta_{SD}(P, Q),$$

as $|(P \vee Q) \cap (P \wedge Q)| = |P \cap Q|$. ■

For example, let $N = \{1, 2, 3, 4, 5, 6, 7\}$ and consider partitions
$P, Q \in \mathcal{P}^N$ with $P = \{12|34|567\}$ and $Q = \{12|35|467\}$. Then,
$P \vee Q = \{12|34567\}$ and $P \wedge Q = \{12|3|4|5|67\}$, and thus
$\delta_{SD}(12|34567, 12|3|4|5|67) = 5$ while
$\delta_{SD}(12|34|567, 12|35|467) = 4$. On the other hand,
$P' = \{12|34|56|7\}$ and $Q' = \{12|3|45|67\}$ yield
$P' \vee Q' = \{12|34567\}$ and $P' \wedge Q' = \{12|3|4|5|6|7\}$, and thus
$\delta_{SD}(P' \vee Q', P' \wedge Q') = 6 = \delta_{SD}(P', Q')$.
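Claim 2 and the two worked examples can be spot-checked by exhaustive enumeration on small ground sets. The Python sketch below is only such a check under an assumed representation (partitions as sets of frozensets), not a proof; all helper names are hypothetical.

```python
def parse(text):
    """Hypothetical helper: '12|34' -> frozenset of blocks (frozensets of ints)."""
    return frozenset(frozenset(int(c) for c in b) for b in text.split("|"))


def set_partitions(elements):
    """Yield every partition of the list `elements` as a list of blocks."""
    if not elements:
        yield []
        return
    first, rest = elements[0], elements[1:]
    for smaller in set_partitions(rest):
        # place `first` into each existing block in turn ...
        for i in range(len(smaller)):
            yield smaller[:i] + [[first] + smaller[i]] + smaller[i + 1:]
        # ... or into a new block of its own
        yield [[first]] + smaller


def meet(p, q):
    """Non-empty pairwise intersections of blocks."""
    return frozenset(a & b for a in p for b in q if a & b)


def join(p, q):
    """Merge overlapping blocks until every block of q lies in one class."""
    blocks = [set(b) for b in p]
    for b in q:
        touching = [c for c in blocks if c & b]
        rest = [c for c in blocks if not (c & b)]
        blocks = rest + [set().union(b, *touching)]
    return frozenset(frozenset(c) for c in blocks)


def delta_sd(p, q):
    """Number of blocks belonging to exactly one of the two partitions."""
    return len(p ^ q)


def violations(n):
    """Pairs (P, Q) of partitions of {1..n} breaking super-modularity of delta_sd."""
    parts = [frozenset(frozenset(b) for b in p)
             for p in set_partitions(list(range(1, n + 1)))]
    return [(p, q) for p in parts for q in parts
            if delta_sd(join(p, q), meet(p, q)) < delta_sd(p, q)]


P, Q = parse("12|34|567"), parse("12|35|467")
P2, Q2 = parse("12|34|56|7"), parse("12|3|45|67")
```

Calling `violations(n)` for small n returns an empty list when the inequality of Claim 2 holds on every pair; the enumeration grows with the Bell numbers, so it is only practical for small ground sets.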

Claim 3 Neither $D(\cdot,\cdot)$ nor $\delta_{SD}(\cdot,\cdot)$ satisfies
co-maximality.

Proof: Concerning $D(\cdot,\cdot)$, the proof consists in providing a pair of
complements $P, Q$ between which partition-distance $D(P, Q)$ is strictly less
than the maximum $n - 1$. To this end, let $n$ be odd and sufficiently large.
Consider $P = \{A, B, \{i\}\}$ and
$Q = \{\{i, j, j'\}\} \cup P_\bot^{\{i, j, j'\}^c}$ with
$|A| = |B| = \frac{n-1}{2}$ as well as $j \in A$, $j' \in B$. Then,
$P \wedge Q = P_\bot$ as well as $P \vee Q = P^\top$, and yet
$D(P, Q) = n - 3$, in that both $P$ and $Q$ induce the same partition of any
3-cardinal subset of the form $\{i, l, l'\}$ such that $l \in A \setminus j$,
$l' \in B \setminus j'$.

Concerning $\delta_{SD}(\cdot,\cdot)$, a stronger result is actually obtained,
namely that this measure does not even satisfy $\bot\top$-maximality. To see
this, again let $n$ be odd and sufficiently large; in particular,
$\frac{n+1}{4} \in \mathbb{N}$. Let $P = P^A \cup P_\bot^{A^c}$ and
$Q = Q^B \cup P_\bot^{B^c}$ with $|A \cap B| = 1$ as well as
$|P^A| = \frac{n+1}{4} = |Q^B|$. In words, both $P, Q$ have only 2- and
1-cardinal blocks, and the same numbers $\frac{n+1}{4}$ and $\frac{n-1}{2}$ of
blocks for each of these two cardinalities, respectively. In addition, only one
element $i \in N$ (of the set being partitioned) is included in some 2-cardinal
block both in $P$ and in $Q$, that is $\{i\} = A \cap B$ (while all other
elements $j \in N \setminus i$ are in a 2-cardinal block of $P$ and in a
1-cardinal block of $Q$, or vice versa). Then,
$\delta_{SD}(P, Q) = 2\frac{n+1}{4} + 2\frac{n-1}{2} = \frac{3n-1}{2} > n + 1 = \delta_{SD}(P^\top, P_\bot)$. ■

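Claim 4 For all $P, Q \in \mathcal{P}^N$, if $P \geq Q$, then

$$\delta_{SD}(P, Q) = \delta_{RB}(P, Q) + 2|P \setminus Q| = 2|Q \setminus P| - \delta_{RB}(P, Q). \qquad (5)$$

Proof: If $P \geq Q$, then
$\delta_{RB}(P, Q) = r(P \vee Q) - r(P \wedge Q) = r(P) - r(Q)$

$$= n - |P \setminus Q| - |P \cap Q| - (n - |Q \setminus P| - |P \cap Q|)$$
$$= |Q \setminus P| - |P \setminus Q| = \delta_{SD}(P, Q) - 2|P \setminus Q| = 2|Q \setminus P| - \delta_{SD}(P, Q)$$

as wanted. ■

Claim 5 For all $P, Q \in \mathcal{P}^N$,

$$\delta_{SD}(P, Q) = 2(n - |P \cap Q|) - (r(P) + r(Q)).$$

Proof: Simply by substitution:

$$\delta_{SD}(P, Q) = |P \setminus Q| + |Q \setminus P|$$
$$= |P \setminus Q| + |P \cap Q| + |Q \setminus P| + |P \cap Q| - 2|P \cap Q|$$
$$= n - r(P) + n - r(Q) - 2|P \cap Q|$$

as wanted. ■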

It seems important to recognize how the meet $\wedge$ and join $\vee$ operators
of the partition lattice are used in different manners by the RB and SD distance
measures. Both perform a count based on the blocks of either one but not both
of the involved partitions $P, Q$. These are precisely the blocks disjoined by
$\wedge$ and adjoined by $\vee$. Yet, RB distance counts the number of blocks
resulting from the join and subtracts it from the number of blocks resulting
from the meet. Of course, blocks of both the meet and the join vanish through
the subtraction. Conversely, SD counts the whole number of blocks of either one
but not both partitions $P, Q$. Hence, when the latter are comparable in terms
of coarsening, say $P \geq Q$, condition (5) is plain.

Although the RB distance behaves exactly the same as the Hamming distance
between subsets according to classifiers 1-7 above, still the former does
not seem to properly translate the latter in terms of partitions. In particular,
as both $D(\cdot,\cdot)$ and $\delta_{SD}(\cdot,\cdot)$ are super-modular and do
not satisfy co-maximality, these latter two measures are actually preferable
over $\delta_{RB}(\cdot,\cdot)$. The reason for this, roughly speaking, is that
the subset and partition lattices are very different, and RB distance simply
ignores such differences.

Focus on super-modularity first. With their two Hasse diagrams in mind,
consider that there are $B_n - 2^n$ more partitions than subsets of an n-set,
and such a gap grows dramatically fast as $n$ increases. Yet, partitions are
compressed into $n$ levels, one less than subsets. There are $\binom{n}{k}$
distinct k-subsets of an n-set, $0 \leq k \leq n$, while there are
$S_{n,k} = \frac{1}{k!} \sum_{0 \leq l \leq k} (-1)^l \binom{k}{l} (k - l)^n$
distinct ways to partition an n-set into $k$ blocks, $0 < k \leq n$, where
$S_{n,k}$ are the Stirling numbers of the second kind [BJ p. 265] or
cardinalities of levels $n - k$, $0 < k \leq n$, of the partition lattice.

While moving upward in the Hasse diagram, in both lattices the cardinality of
levels firstly increases, reaching a maximum, and then decreases. Yet, in the
subset lattice such a maximum is always reached at levels
$\{\lfloor \frac{n}{2} \rfloor, \lceil \frac{n}{2} \rceil\}$ whenever they
differ (and at level $\frac{n}{2} \in \mathbb{N}$ otherwise), and the preceding
ascent is exactly the same as the following descent. No such regular behavior is
displayed by partitions, as the upper part of the Hasse diagram is much more
populated than the lower one. In fact, the maximum density is attained well above
the half level, making the preceding ascent slow and the following descent fast
dJ pp. 91-92], 0. 

All this leads to the conclusion that when up-down distances between partitions
are replaced with left-right ones (see above on sub/super-modularity), a kind of
quantitative expansion occurs with respect to the subset lattice, in that there
are many more pairs of incomparable partitions than pairs of incomparable
subsets, simply because most level sets are massively more populated in the
partition lattice than in the subset one. Given such an expansion, any distance
measure such as RB given in (4), that compares $P, Q$ by taking into account, in
some fashion, the whole segment (or sub-lattice) $[P \wedge Q, P \vee Q]$, is
then forced to also take into account, in the same fashion, all the differences
between partitions within such a segment. Conversely, SD and partition-distance
are not under such a forcing, and thus can adapt their behavior to a proper
subset of the segment.

As for complementarity, it is crucial to note (again) that non-modular
partitions have many complements, and the latter differ in terms of both the
number and the cardinalities of blocks [14]. Accordingly, asking a distance
measure to attain its maximum on every pair of complements is reasonable in the
subset lattice but becomes far too binding when dealing with partitions. This
is the second reason why $\delta_{RB}(\cdot,\cdot)$ is less desirable than
$\delta_{SD}(\cdot,\cdot)$ and $D(\cdot,\cdot)$.

Finally, among these latter two, SD distance is better because it takes many
more values than partition-distance. More precisely, as partitions may differ
in a number of distinct ways that greatly exceeds n — 2 = . . . ,n — 1}|, 
there are many differences between partitions which are substantially diverse
while still being mapped by $D(\cdot,\cdot)$ into the same integer between 1 and
$n - 1$. Conversely, SD distance is able to recognize that such differences are
diverse, and thus maps them into distinct (integer-valued) distances. In this
view, an even better solution to the problem of quantitatively discriminating
between differences that are factually diverse is proposed in the sequel. Still,
a super-modular distance measure
$\delta'_{RB} : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ not
satisfying co-maximality may be constructed even by resorting simply to the
rank:

$$\delta'_{RB}(P, Q) = r(P) + r(Q) - 2 r(P \wedge Q) \qquad (6)$$
$$= \delta_{RB}(P, Q) + (r(P) + r(Q) - (r(P \vee Q) + r(P \wedge Q))). \qquad (7)$$

This distance is super-modular precisely because the rank is a sub-modular
partition function, and coincides with $\delta_{RB}(P, Q)$ if and only if
$P, Q$ is a modular pair [H], that is, if and only if
$r(P \vee Q) + r(P \wedge Q) = r(P) + r(Q)$. It is also easily checked that
$\delta'_{RB}(\cdot,\cdot)$ does not satisfy co-maximality.

Elementary though it is, one important observation is now the following: the
rank is a monotone lattice function through which RB distance quantifies
differences between lattice elements. This may be generalized: once endowed with
a monotone lattice function $h$ on $X^N$, that is $h(x) \geq h(y)$ for all
$x, y \in X^N$ with $x \geq y$, differences between elements $x, y \in X^N$ can
be promptly quantified by distance
$\delta(x, y) = h(x \vee y) - h(x \wedge y)$, which is evidently modular by
construction. Then, RB measure uses the rank, but any other monotone partition
function works. An alternative one is introduced hereafter.

4 Atoms and the size 

Apart from the bottom and top, among the remaining $2^n - n - 2$ elements
of $\mathcal{P}^N_{mod}$ (let $n > 2$) there are $\binom{n}{2}$ modular
partitions playing a crucial role in what follows. They are the atoms of the
partition lattice, each consisting of $n - 1$ blocks, one being 2-cardinal and
all remaining ones being 1-cardinal. For $1 \leq i < j \leq n$, denote by
$[ij] = \{i, j\} \cup P_\bot^{\{i, j\}^c}$ the atom whose unique 2-cardinal
block is $\{i, j\}$, with
$\mathcal{P}^N_{(1)} = \{[ij] : 1 \leq i < j \leq n\} \subset \mathcal{P}^N_{mod}$
containing all $\binom{n}{2}$ such atoms.

Note that $n = 1$ yields $\mathcal{P}^N_{(1)} = \emptyset$, while $n = 2$ yields
$\mathcal{P}^N_{(1)} = \{P^\top\}$ as well as $n = 3$ yields
$\mathcal{P}^N = \mathcal{P}^N_{mod} = \{P_\bot\} \cup \mathcal{P}^N_{(1)} \cup \{P^\top\}$.

The focus now turns to representing partitions $P$ as strings
$I_P \in \{0, 1\}^{\binom{n}{2}}$. For every partition $P \in \mathcal{P}^N$,
consider the array representation or indicator function
$I_P : \mathcal{P}^N_{(1)} \to \{0, 1\}$ defined by $I_P([ij]) = 1$ if
$P \geq [ij]$ and $I_P([ij]) = 0$ if $P \not\geq [ij]$. This is clearly the
analog of the characteristic function $\chi_A$ for subsets $A \in 2^N$. Yet, a
fundamental distinction must be immediately emphasized: while $\chi$ is a
bijection, in that $\{\chi_A : A \in 2^N\} = \{0, 1\}^n$, the partition
indicator function does not reach every vertex of the
$\binom{n}{2}$-dimensional unit hypercube, as
$\{I_P : P \in \mathcal{P}^N\} \subset \{0, 1\}^{\binom{n}{2}}$. This redundancy is due to linear dependence,
characterizing geometric lattices in general [TJ [TS] [TB] . 

The partition indicator function
$I : \mathcal{P}^N \to \{0, 1\}^{\binom{n}{2}}$, with $I(P) = I_P$, makes it
possible to introduce the size $s : \mathcal{P}^N \to \mathbb{Z}_+$, firstly
appearing in [12] as the analog (in a sense made clearer shortly) of the
cardinality of subsets. The size $s(P) = s^P$ is the number of atoms finer than
$P$, that is,

$$s^P = |\{[ij] \in \mathcal{P}^N_{(1)} : P \geq [ij]\}| = \sum_{1 \leq i < j \leq n} I_P([ij]).$$

The size maps partitions of an n-set into the first $\binom{n}{2} + 1$
non-negative integers, but many of the latter are left out. That is, there are
naturals $s < \binom{n}{2}$ such that $s \neq s^P$ for all
$P \in \mathcal{P}^N$. The available sizes for partitions of an n-set,
$n \leq 7$, are as follows:

|N| = n -> {s^P : P in P^N} (available sizes)

1 -> {0}

2 -> {0,1}

3 -> {0,1,3}

4 -> {0,1,2,3,6}

5 -> {0,1,2,3,4,6,10}

6 -> {0,1,2,3,4,6,7,10,15}

7 -> {0,1,2,3,4,5,6,7,9,10,11,15,21}.

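On the enumerative side, the size is obtained from the class
$c : \mathcal{P}^N \to \mathbb{Z}_+^n$, where
$c(P) = c^P = (c^P_1, \ldots, c^P_n)$ with
$c^P_k = |\{A \in P : |A| = k\}|$ counting the number of k-cardinal blocks of
$P$, for $1 \leq k \leq n$. Then,

$$s^P = \sum_{1 \leq k \leq n} c^P_k \binom{k}{2} = \sum_{A \in P} \binom{|A|}{2}.$$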
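Since the size depends only on the class, that is, on the multiset of block cardinalities, the available sizes listed above can be reproduced by running over the integer partitions of n. The Python sketch below does this; `integer_partitions` and `available_sizes` are hypothetical names chosen for the example.

```python
from math import comb


def integer_partitions(n, largest=None):
    """Yield the block-cardinality profiles of partitions of an n-set,
    as non-increasing tuples of positive integers summing to n."""
    if largest is None:
        largest = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, largest), 0, -1):
        for tail in integer_partitions(n - k, k):
            yield (k,) + tail


def available_sizes(n):
    """Sorted list of the values taken by the size over partitions of an n-set,
    using s^P = sum over blocks A of binom(|A|, 2)."""
    return sorted({sum(comb(k, 2) for k in shape)
                   for shape in integer_partitions(n)})


sizes_by_n = {n: available_sizes(n) for n in range(1, 8)}
```

For instance, with n = 4 the profiles (1,1,1,1), (2,1,1), (2,2), (3,1) and (4) give sizes 0, 1, 2, 3 and 6, matching the corresponding row of the list.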

Claim 6 TTie size is a strictly monotone partition function: 

s p > s Q for all P,Q eV N such that P> Q. 

Proof: If P > Q, then at least one block A G P is the union of some blocks 
Pi, ... , B m G Q, m > 2. Merging any two such B, B' increases the size by 



which is strictly positive as blocks are non-empty. 



\B\\B'\ 
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Claim 7 The size is a super-modular partition function: 

S PVQ + fi PAQ > g P + S Q f()r aU PiQep N^ 

Proof: If the two partitions are comparable, say $P \geq Q$, then $P = P \vee Q$
and $Q = P \wedge Q$, which makes the statement satisfied with equality. Otherwise,
$P \not\geq Q \not\geq P$ entails that there are two maximal chains of partitions, one of which
meets $P \wedge Q$ and $P$ as well as $P \vee Q$, while the other meets $P \wedge Q$ and $Q$ as
well as $P \vee Q$. Focusing on the relevant part or segment$^2$ of the former maximal
chain, there are $P_{\tilde r} >^* \cdots >^* P_1 >^* P_0$, with $\tilde r = r(P \vee Q) - r(P \wedge Q)$, such
that $P_0 = P \wedge Q$ and $P_{\tilde r} = P \vee Q$ as well as $P_{k_P} = P$ for some $k_P$, $0 < k_P < \tilde r$.
Similarly, focusing on the relevant segment of the latter maximal chain$^3$, there
are $Q_{\tilde r} >^* \cdots >^* Q_1 >^* Q_0$ such that $Q_0 = P \wedge Q$ and $Q_{\tilde r} = P \vee Q$ as well as
$Q_{k_Q} = Q$ for some $k_Q$, $0 < k_Q < \tilde r$. Note that if $r(P) = r(Q)$, then $k_P = k_Q$.

The count $s^{P \vee Q} + s^{P \wedge Q} - (s^P + s^Q)$ may be performed by focusing on each
level of the two segments. The fact is that most atoms finer than $P \vee Q$ are
$\geq$-incomparable with respect to both $P$ and $Q$. Atoms $[ij] \leq P \wedge Q$ may be
ignored because they are counted in the size of all the four involved partitions
$P, Q, P \wedge Q, P \vee Q$. As for the remaining ones, observe that

$$\{[ij] \in \mathcal{P}^N_{(1)} : P \geq [ij] \not\leq P \wedge Q\} \cap \{[ij] \in \mathcal{P}^N_{(1)} : Q \geq [ij] \not\leq P \wedge Q\} = \emptyset.$$

To see this, assume an atom $[ij] \not\leq P \wedge Q$ satisfies $P \geq [ij] \leq Q$. Then,
$(P \wedge Q) \vee [ij]$, and not $P \wedge Q$, would be the coarsest partition finer than both
$P, Q$. In particular, $(P \wedge Q) \not\geq [ij] \Rightarrow ((P \wedge Q) \vee [ij]) >^* (P \wedge Q)$.

Consider going from $P \wedge Q$ to $P \vee Q$ through the Hasse diagram twice, initially
endowed with all atoms finer than $P \vee Q$ apart from those also finer than $P \wedge Q$.
The first route is through segment $P_0, \ldots, P_{\tilde r}$ of the former maximal chain, with
the following constraint: at each partition reached up to $P_{k_P} = P$ inclusive, all
atoms finer than the current partition but not also finer than the preceding one
must be left there in order to proceed. The second route starts with only the
residual atoms and is through segment $Q_0, \ldots, Q_{\tilde r}$ of the latter maximal chain.
Again, up to $Q_{k_Q} = Q$ inclusive, at each reached level all atoms finer than the
current partition but not also finer than the preceding one must be left there in
order to proceed. Given the above empty intersection, it is not possible that an
atom is needed twice for proceeding, and at the end of the second route there
still remains a non-empty (and large, in general) collection of atoms, namely all
those for reaching $P \vee Q$ from either $P$ or $Q$. ■
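Claims 6 and 7 lend themselves to an exhaustive check on small ground sets. The sketch below is one possible way of doing so, assuming partitions are encoded as label tuples (restricted growth strings); all function names are illustrative and not taken from the paper.

```python
from collections import Counter
from math import comb


def set_partitions(n):
    """Yield every partition of {0,...,n-1} as a tuple of block labels."""
    def grow(prefix, used):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(used + 1):
            yield from grow(prefix + [label], max(used, label + 1))
    yield from grow([], 0)


def size(p):
    """Number of atoms finer than p: sum of C(|A|, 2) over blocks A."""
    return sum(comb(c, 2) for c in Counter(p).values())


def meet(p, q):
    """Coarsest common refinement: elements share a block iff they do in both."""
    return tuple(zip(p, q))


def join(p, q):
    """Finest common coarsening, via union-find over the blocks of p and q."""
    parent = list(range(len(p)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for part in (p, q):
        first = {}
        for i, label in enumerate(part):
            if label in first:
                parent[find(i)] = find(first[label])
            else:
                first[label] = i
    return tuple(find(i) for i in range(len(p)))


def refines(q, p):
    """True if every block of q is contained in a block of p (q <= p)."""
    image = {}
    return all(image.setdefault(a, b) == b for a, b in zip(q, p))


def check(n):
    parts = list(set_partitions(n))
    monotone = all(size(p) > size(q)
                   for p in parts for q in parts
                   if p != q and refines(q, p))
    supermodular = all(size(join(p, q)) + size(meet(p, q)) >= size(p) + size(q)
                       for p in parts for q in parts)
    return monotone, supermodular


print(check(5))
```

The check is only a finite verification for the chosen $n$; it complements, and does not replace, the argument above.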

From a final perspective, consider that any subset has a unique representation
as a join of atoms $i \in N$ of the subset lattice, while linear dependence
makes partitions have, in general, many representations as a join of atoms. Most
of them are redundant, in that removing some atom(s) from the join leaves the
represented partition unchanged. In fact, any partition has a unique maximal
or largest representation as a join of atoms. The size counts precisely the
cardinality of this largest representation.

2 A chain, possibly maximal, is a totally ordered sub-lattice, and thus has segments.
3 The length $\tilde r$ is the same for the two segments.

5 The indicator-Hamming distance measure

The size makes it possible to introduce two novel partition distances. For reasons
explained immediately below, they may be referred to as follows:

• the indicator-Hamming distance $\delta_{IH} : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ defined by

$$\delta_{IH}(P,Q) = \sum_{1 \leq i < j \leq n} \left(I_P([ij]) - I_Q([ij])\right)^2 = s^P + s^Q - 2s^{P \wedge Q}, \qquad (8)$$

• the size-based distance $\delta_{SB} : \mathcal{P}^N \times \mathcal{P}^N \to \mathbb{Z}_+$ defined by

$$\delta_{SB}(P,Q) = \sum_{1 \leq i < j \leq n} \left(I_{P \vee Q}([ij]) - I_{P \wedge Q}([ij])\right) = s^{P \vee Q} - s^{P \wedge Q}. \qquad (9)$$

Just like the Hamming distance between subsets uses their symmetric difference
as the (counting) measure, the IH distance simply counts the number of non-matched
entries $I_P([ij]) \neq I_Q([ij])$ for $1 \leq i < j \leq n$ between the two array
representations $I_P, I_Q$ of any two partitions $P, Q$ (as $I_P([ij]) - I_Q([ij]) \in \{-1, 0, 1\}$).
This means counting the number of atoms finer than either one of the two partitions
but not both, which is exactly what the Hamming distance between subsets
does in (1) above. Accordingly, this IH measure is here conceived as the faithful
reproduction of the (cardinality of the) symmetric difference between subsets.
In terms of the above classifiers 1-7, its behavior will shortly appear rather
different when compared to the Hamming distance between subsets. In fact, as
explained above, the partition and subset lattices display great differences.

Much more roughly, SB distance counts the number of atoms finer than the
meet $P \wedge Q$ and subtracts it from the number of atoms finer than the join $P \vee Q$.
It is immediate to note that the two measures SB and IH coincide on pairs of
comparable partitions: if (say) $P \geq Q$, then $P \vee Q = P$, $Q = P \wedge Q$. More
generally, these two distances coincide on all and only those pairs $P, Q \in \mathcal{P}^N$
(possibly $P \not\geq Q \not\geq P$) where the size function satisfies $s^{P \vee Q} + s^{P \wedge Q} = s^P + s^Q$.
It may be checked that this attains only on modular pairs [14], that is,

$$r(P) + r(Q) = r(P \vee Q) + r(P \wedge Q) \Leftrightarrow s^P + s^Q = s^{P \vee Q} + s^{P \wedge Q}$$

for all $P, Q \in \mathcal{P}^N$. In this respect, IH distance transforms SB distance similarly
to how $\delta^+_{RB}(\cdot, \cdot)$ transforms $\delta_{RB}(\cdot, \cdot)$ (see (4), (6) and (7) above).
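Expressions (8) and (9) translate directly into a few lines of code once meet, join and size are available. The following sketch assumes partitions of $\{1, \ldots, n\}$ are given as lists of non-singleton blocks, with unlisted elements treated as singletons; the helper names (`labels`, `delta_ih`, `delta_sb`) are hypothetical.

```python
from collections import Counter
from math import comb


def labels(blocks, n):
    """Label tuple of a partition of {1,...,n}; unlisted elements are singletons."""
    lab = {x: b for b, block in enumerate(blocks) for x in block}
    return tuple(lab.get(x, -x) for x in range(1, n + 1))


def size(p):
    return sum(comb(c, 2) for c in Counter(p).values())


def meet(p, q):
    return tuple(zip(p, q))


def join(p, q):
    parent = list(range(len(p)))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for part in (p, q):
        first = {}
        for i, label in enumerate(part):
            if label in first:
                parent[find(i)] = find(first[label])
            else:
                first[label] = i
    return tuple(find(i) for i in range(len(p)))


def delta_ih(p, q):
    """Indicator-Hamming distance, expression (8)."""
    return size(p) + size(q) - 2 * size(meet(p, q))


def delta_sb(p, q):
    """Size-based distance, expression (9)."""
    return size(join(p, q)) - size(meet(p, q))


# Comparable pair {123|4} >= {12|3|4}: the two distances coincide.
p, q = labels([{1, 2, 3}], 4), labels([{1, 2}], 4)
print(delta_ih(p, q), delta_sb(p, q))

# Incomparable pair {12|34}, {13|24}: SB exceeds IH.
p, q = labels([{1, 2}, {3, 4}], 4), labels([{1, 3}, {2, 4}], 4)
print(delta_ih(p, q), delta_sb(p, q))
```

The second pair has meet $P_\bot$ and join $P_\top$, so the gap between the two values is exactly the slack in the super-modularity inequality of claim 7.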

It is important to observe that a main distinction between the IH and
SB distance measures lies in their ranges (or images [Up. 5]), which do not
coincide, one being a proper subset of the other. The range of the size-based
distance contains only certain positive differences between some available sizes
of partitions (see above$^4$). In addition to these values, attained all the same on
modular pairs $P, Q$, IH distance has a variety of further positive integers in its
range. This is evident from super-modularity of the size function, and provides
the needed granularity and local flexibility when quantifying differences between
incomparable partitions.

4 The number of available sizes for partitions of an $n$-set exceeds $n$ for $n > 3$; in fact, as soon
as $n > 3$ non-modular elements start appearing.

Both measures satisfy $f$-maximality with $f(n) = \binom{n}{2}$, and hence both strong
$f$-monotonicity and strong $f$-convexity hold, in that $\binom{n+1}{2} - \binom{n}{2} = n$ as well
as $\binom{n+2}{2} + \binom{n}{2} - 2\binom{n+1}{2} = 1$.

Both measures satisfy $\bot\top$-maximality, and thus mod-maximality, but the
SB one also satisfies co-maximality, while the IH one does not. In fact, in (8)
the join $P \vee Q$ of the two partitions does not even appear.

By construction, the SB measure is modular, while the IH one is super-modular,
precisely because the size is a super-modular partition function:

$$\delta_{IH}(P \vee Q, P \wedge Q) - \delta_{IH}(P, Q) = s^{P \vee Q} + s^{P \wedge Q} - 2s^{P \wedge Q} - (s^P + s^Q - 2s^{P \wedge Q}) = s^{P \vee Q} + s^{P \wedge Q} - (s^P + s^Q) \geq 0$$

from above. In fact, SB distance is the minimal modular distance no smaller
than IH distance over all pairs of partitions.

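SB distance restricted to $\mathcal{P}^N_{mod} \times \mathcal{P}^N_{mod}$ is $\delta_{SB}(P_\top, P_\bot) = \binom{n}{2}$, while for
$\emptyset \subset A \subset N$

$$\delta_{SB}(\{A\} \cup P^{A^c}_\bot, P_\bot) = \binom{|A|}{2},$$
$$\delta_{SB}(\{A\} \cup P^{A^c}_\bot, P_\top) = \binom{n}{2} - \binom{|A|}{2},$$
$$\delta_{SB}(\{A\} \cup P^{A^c}_\bot, \{A^c\} \cup P^A_\bot) = \binom{|A|}{2} + \binom{n - |A|}{2}.$$

Case $\emptyset \subset A, B \subset N$ and $A \neq B \neq A^c$ yields an expression
which reduces to $\binom{|A|}{2} + \binom{|B|}{2}$ whenever $A \cap B = \emptyset$ (as $\binom{0}{2} = \binom{1}{2} = 0$).

IH distance restricted to $\mathcal{P}^N_{mod} \times \mathcal{P}^N_{mod}$ is $\delta_{IH}(P_\top, P_\bot) = \binom{n}{2}$, while for
$\emptyset \subset A \subset N$, expression (8) gives

$$\delta_{IH}(\{A\} \cup P^{A^c}_\bot, P_\bot) = \binom{|A|}{2},$$
$$\delta_{IH}(\{A\} \cup P^{A^c}_\bot, P_\top) = \binom{n}{2} - \binom{|A|}{2},$$
$$\delta_{IH}(\{A\} \cup P^{A^c}_\bot, \{A^c\} \cup P^A_\bot) = \binom{|A|}{2} + \binom{n - |A|}{2}.$$

Case $\emptyset \subset A, B \subset N$ and $A \neq B \neq A^c$ yields

$$\delta_{IH}(\{A\} \cup P^{A^c}_\bot, \{B\} \cup P^{B^c}_\bot) = \binom{|A|}{2} + \binom{|B|}{2} - 2\binom{|A \cap B|}{2}.$$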

Even when restricted to the $2^n - n$ modular partitions, these two distance
measures still display different behavior in most cases of incomparability.

6 A comparison through bounding

This section compares partition-distance $D(\cdot, \cdot)$ and indicator-Hamming distance
$\delta_{IH}(\cdot, \cdot)$ with the intent to figure how many different values the latter
may take for every (non-trivial) value of the former. In fact, for $k$ such that
$0 < k < n$ ($k = 0$ is indeed trivial), the concern is with the maximum and minimum
value taken by $\delta_{IH}(\cdot, \cdot)$ while ranging over all pairs $P, Q \in \mathcal{P}^N$ satisfying
$D(P, Q) = k$. To this end, the following result is important in that it shows
that looking at a largest subset $A \subseteq N$ where any two partitions $P, Q \in \mathcal{P}^N$
coincide is equivalent to looking at the largest collection of atoms that are finer
than both.

Claim 8 If $A \in 2^N$ is a maximal subset where $P^A = Q^A$, then $s^{P^A} = s^{P \wedge Q}$.

Note that $P^A \in \mathcal{P}^A$ while $P \wedge Q \in \mathcal{P}^N$, but still the size of any partition is a
non-negative integer, and thus the sizes of two partitions are comparable even when
these latter are elements of distinct lattices. In fact, $\mathcal{P}^A$ is equivalent to segment
$[P_\bot, \{A\} \cup P^{A^c}_\bot] \subseteq \mathcal{P}^N$ (see above).

Proof: If $A = N$, then $P = Q$ and there is nothing to show. Assume $A \subset N$.
Then, $P \geq P^A \cup P^{A^c}_\bot$ as well as $Q \geq P^A \cup P^{A^c}_\bot$, entailing that both $P$ and $Q$
obtain by joining $P^A \cup P^{A^c}_\bot$ with atoms $[ij] \not\leq P^A \cup P^{A^c}_\bot$ as follows

$$P = (P^A \cup P^{A^c}_\bot) \vee [ij]_1 \vee \cdots \vee [ij]_{m_P},$$
$$Q = (P^A \cup P^{A^c}_\bot) \vee [ij]'_1 \vee \cdots \vee [ij]'_{m_Q}.$$

These collections $\{[ij]_1, \ldots, [ij]_{m_P}\} = M_P$, $\{[ij]'_1, \ldots, [ij]'_{m_Q}\} = M_Q \subset \mathcal{P}^N_{(1)}$
need not be unique, in general, but both $P$ and $Q$ display each a unique maximal
collection $\{[ij]_1, \ldots, [ij]_{m^*_P}\} = M^*_P$, $\{[ij]'_1, \ldots, [ij]'_{m^*_Q}\} = M^*_Q$ of atoms satisfying
these two equalities. Clearly, $[ij] \not\leq \{A\} \cup P^{A^c}_\bot$ for all $[ij] \in M^*_P \cup M^*_Q$, and these
two maximal collections have empty intersection, $M^*_P \cap M^*_Q = \emptyset$, in that if there
was any $[ij]$ included in both, then in partition $[ij] \vee (P^A \cup P^{A^c}_\bot) >^* P^A \cup P^{A^c}_\bot$
there would be some $A' \supset A$ such that $P^{A'} = Q^{A'}$, and hence $A$ could not be a
maximal subset where $P$ and $Q$ coincide. Finally, as $P \wedge Q = \bigvee_{P \geq [ij] \leq Q} [ij]$, the
sought conclusion

$$\{[ij] \in \mathcal{P}^N_{(1)} : [ij] \leq P^A \cup P^{A^c}_\bot\} = \{[ij] \in \mathcal{P}^N_{(1)} : [ij] \leq P \wedge Q\}$$

follows. ■

Thus, $s^P - s^{P^A} = s^P - s^{P \wedge Q}$ as well as $s^Q - s^{P^A} = s^Q - s^{P \wedge Q}$, and
$\delta_{IH}(P, Q) = s^P + s^Q - 2s^{P \wedge Q} = m^*_P + m^*_Q$, where $m^*_P, m^*_Q$ are as above:

$$P = (P^A \cup P^{A^c}_\bot) \vee [ij]_1 \vee \cdots \vee [ij]_{m^*_P}, \quad s^P = s^{P^A} + m^*_P,$$
$$Q = (P^A \cup P^{A^c}_\bot) \vee [ij]'_1 \vee \cdots \vee [ij]'_{m^*_Q}, \quad s^Q = s^{P^A} + m^*_Q.$$

The issue is now constructing maximal collections $M^*_P, M^*_Q$ for maximizing or
else minimizing $\delta_{IH}(P, Q) = |M^*_P| + |M^*_Q|$, while obeying the following.

Claim 9 If $A \in 2^N$, $A \neq N$ is a maximal subset where $P^A = Q^A$, then

$$\Big(\bigcup_{[ij] \in M^*_P \cup M^*_Q} \{i, j\}\Big) \cap A^c = A^c, \qquad (10)$$

$$\Big(\bigcup_{[ij] \in M^*_P \cup M^*_Q} \{i, j\}\Big) \cap A \neq \emptyset. \qquad (11)$$

Proof: The former condition seems evident: every $j \in A^c$ must be in the
(unique) 2-cardinal block of at least one of the atoms $[ij] \in M^*_P \cup M^*_Q$; otherwise,
there would be some proper superset $A' \supset A$, namely the union of $A$ and all
$j \in A^c$ left out by both collections of atoms, where $P^{A'} = Q^{A'}$.
Now assume the latter condition is not satisfied: $\bigvee_{[ij] \in M^*_P \cup M^*_Q} [ij] \leq \{A^c\} \cup P^A_\bot$.
Define $P' = \bigvee_{[ij] \in M^*_P} [ij]$ and $Q' = \bigvee_{[ij] \in M^*_Q} [ij]$ and consider any $\hat{A} \subseteq A^c$ such
that $|\hat{A} \cap B| = 1$ for every $B \in (P' \vee Q')^{A^c}$. Then, $|\hat{A}| = |(P' \vee Q')^{A^c}| \geq 1$,
and $P^{A \cup \hat{A}} = Q^{A \cup \hat{A}}$, again violating the assumption that $A$ is a maximal subset
where $P^A = Q^A$. ■

As every $j \in A^c$ has to be in the 2-cardinal block of at least one atom
$[ij] \in M^*_P \cup M^*_Q$, minimization surely attains, as long as possible, when every
$j \in A^c$ is in the 2-cardinal block of precisely one atom in the union of the two
maximal collections, in which case $\delta_{IH}(P, Q) = |M^*_P| + |M^*_Q| = D(P, Q)$.

Claim 10 For $k \leq 2(n - k)$, the lower bound is $\min_{P, Q \in \mathcal{P}^N,\ D(P,Q) = k} \delta_{IH}(P, Q) = k$.

Proof: Fix $k \leq 2(n - k)$ and let $A \in 2^N$, $|A| = n - k$ be a maximal subset where
the two partitions $P, Q$ to be constructed coincide. Of course, if $k = 0$, then
$P = Q$ and there is nothing to show. Otherwise, for $k > 0$, choose $P^A = P^A_\bot$.
Then, inequality $k \leq 2(n - k)$ entails that $P$ and $Q$ (satisfying (10), (11) and
$M^*_P \cap M^*_Q = \emptyset$ above) may be constructed in a way such that they each admit
a unique representation as a join of atoms (which is thus both the maximal
and minimal one). As already observed, this achieves when every $j \in A^c$ is in
only one atom in the union of the two maximal collections, entailing, in turn,
that: (1) every $i \in A$ is in no more than two atoms in that union, and (2)
such two (at most) atoms are one finer than $P$ and incomparable with $Q$, and
the other incomparable with $P$ and finer than $Q$. For the sake of concreteness,
case $k = 2(n - k)$ is easily detailed by setting $A = \{i_1, \ldots, i_{n-k}\}$ as well as
$A^c = \{j_1, \ldots, j_{2(n-k)}\}$. Then,

$$P = [i_1 j_1] \vee [i_2 j_3] \vee [i_3 j_5] \vee \cdots \vee [i_m j_{2m-1}] \vee \cdots \vee [i_{n-k}\, j_{2(n-k)-1}],$$
$$Q = [i_1 j_2] \vee [i_2 j_4] \vee [i_3 j_6] \vee \cdots \vee [i_m j_{2m}] \vee \cdots \vee [i_{n-k}\, j_{2(n-k)}],$$

and $s^P = n - k = s^Q$ as well as $s^{P \wedge Q} = 0$, hence $\delta_{IH}(P, Q) = 2(n - k) = k$.
In general, if the inequality is strict, $k < 2(n - k)$, then not all $n - k$ elements
$i \in A$ appear in two atoms in the union of the two maximal collections, in that
some may be in only one atom, while some others may even be in no atom at
all. What matters is that all needed conditions get satisfied by making every
$i \in A$ appear in at most two such atoms, each finer than one partition
but incomparable with the other, while if every $j \in A^c$ appears in precisely one atom
$[ij] \in M^*_P \cup M^*_Q$, then $|A^c| = |M^*_P| + |M^*_Q| = \delta_{IH}(P, Q) = k = D(P, Q)$. ■

For example, let $N = \{1,2,3,4,5,6\}$ and $P = [12] \vee [34] = \{12|34|5|6\}$ as
well as $Q = [56] = \{1|2|3|4|56\}$. Then, a maximal $A \subset N$ with $P^A = Q^A$
is 3-cardinal, say $A = \{2,4,6\}$, thus $D(P, Q) = 6 - 3 = 3$. Also, $s^{P \wedge Q} = 0$
and $s^P = 2 = 2s^Q$ yield $\delta_{IH}(P, Q) = 2 + 1 = 3$. The same obtains for $P = P_\bot$,
$Q = \{12|34|56\}$. Conversely, $P' = \{12|34|56\}$, $Q' = \{16|23|45\}$ again yield
$P' \wedge Q' = P_\bot$ and a maximal $A \subset N$ with $P'^A = Q'^A$ (such as $A = \{2,4,6\}$), but
now $\delta_{IH}(P', Q') = 6 = 2D(P', Q')$. On the other hand, $P'' = \{12|34|5|6\}$ and
$Q'' = \{12|35|4|6\}$ yield $P'' \wedge Q'' = [12]$ and there is a unique maximal subset
$A \subset N$ where $P''^A = Q''^A$; it is $A = \{1,2,4,5,6\}$, entailing $\delta_{IH}(P'', Q'') = 2 = 2D(P'', Q'')$.
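The values in these examples can be checked mechanically. The sketch below computes $D$ by brute force, as $n$ minus the cardinality of a largest subset on which the two restrictions coincide, and $\delta_{IH}$ as the number of pairs placed in a common block by exactly one of the two partitions. The string format read by `parse` and all function names are illustrative assumptions, and the exhaustive search is only meant for small $n$.

```python
from itertools import combinations


def parse(text):
    """Read a partition written like '12|34|5|6' (single-digit elements)
    into a dict mapping each element to the index of its block."""
    return {int(ch): b for b, block in enumerate(text.split("|")) for ch in block}


def together(p, i, j):
    return p[i] == p[j]


def delta_ih(p, q):
    """Number of pairs {i, j} lying in a common block of exactly one partition."""
    return sum(together(p, i, j) != together(q, i, j)
               for i, j in combinations(sorted(p), 2))


def partition_distance(p, q):
    """n minus the largest |A| such that p and q restricted to A coincide."""
    elements = sorted(p)
    n = len(elements)
    for cardinality in range(n, 0, -1):
        for subset in combinations(elements, cardinality):
            if all(together(p, i, j) == together(q, i, j)
                   for i, j in combinations(subset, 2)):
                return n - cardinality
    return n


examples = [("12|34|5|6", "1|2|3|4|56"),
            ("1|2|3|4|5|6", "12|34|56"),
            ("12|34|56", "16|23|45"),
            ("12|34|5|6", "12|35|4|6")]
for a, b in examples:
    p, q = parse(a), parse(b)
    print(a, b, "D =", partition_distance(p, q), "IH =", delta_ih(p, q))
```

The four printed lines correspond, in order, to the pairs $(P, Q)$, $(P_\bot, \{12|34|56\})$, $(P', Q')$ and $(P'', Q'')$ discussed above.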

If $k > 2(n - k)$, then of course the above construction does not yield the same
result, but still indicates how to obtain the sought minimum: basically, either
$P$ or $Q$ or both constructed in that manner display some block with cardinality
$\geq 3$. In particular, the construction remains valid for determining two minimal
collections of atoms whose join yields the two partitions $P, Q$ where IH distance
is minimized. In the union of these two minimal collections, every $j \in A^c$ (with
$A \in 2^N$ being a maximal subset where $P^A = Q^A$) still appears in precisely
one atom, but when turning to maximal collections this is no longer achievable.



Claim 11 For $2(n - k) < k < n - 1$, the lower bound is $\min_{P, Q \in \mathcal{P}^N,\ D(P,Q) = k} \delta_{IH}(P, Q) = [\ldots]$

Proof: Again choose $P^A = P^A_\bot$ consisting of $n - k$ singletons or 1-cardinal
blocks. Then, covering $A^c$ with atoms $[ij]$ or pairs $\{i, j\}$ such that
$i \in A, j \in A^c$ as indicated above entails that some elements $i \in A$ have to
be atom-linked with more than two distinct elements $j \in A^c$, while every
$j \in A^c$ still appears in precisely one atom. Making this as uniform as possible,
every $i \in A$ appears in either $\lfloor k/(n-k) \rfloor$ or else $\lceil k/(n-k) \rceil$ atoms in the union of the
two collections. Then, the best every $i \in A$ can do for minimizing distance
$\delta_{IH}(P, Q)$ while being atom-linked with either $\lfloor k/(n-k) \rfloor$ or else $\lceil k/(n-k) \rceil$ distinct
$j \in A^c$, is splitting these [...] possible, between $P$ and $Q$. That is, [...]
but incomparable with $Q$, while the remaining [...]
formation of a block, either in $P$ or in $Q$, whose cardinality is $m + 1$, precisely
because $m$ is a number of elements $j \in A^c$ to which a common $i \in A$ is joined
through atoms $[ij]$. Finally, the number of elements $i \in A$ appearing in $\lceil k/(n-k) \rceil$
atoms is $k - (n - k)\lfloor k/(n-k) \rfloor$, while $(n - k) - \big(k - (n - k)\lfloor k/(n-k) \rfloor\big)$ is the number
of elements $i \in A$ appearing in $\lfloor k/(n-k) \rfloor$ atoms. ■

For example, let $N = \{1,2,3,4,5,6,7\}$ and fix $A = \{1,2\}$ as a maximal
subset such that $P^A = Q^A = \{1|2\}$. Then, $2(n - k) = 2(7 - 5) = 4 < 5 = k$ and
$1 \in A$ has to be atom-linked with $\lceil k/(n-k) \rceil = 3$ elements $j \in A^c$ while $2 \in A$ has
to be atom-linked with $\lfloor k/(n-k) \rfloor = 2$ elements $j \in A^c = \{3,4,5,6,7\}$ (or vice versa
switching 1 and 2). On the other hand, $1 \in A$ divides these three atoms into
two determining (through join) partition $P$ and the remaining one determining
partition $Q$. Similarly, $2 \in A$ being involved in an even number of atoms, these
latter can be divided equally between $P$ and $Q$. This means

$$P = [13] \vee [14] \vee [26] = [13] \vee [14] \vee [34] \vee [26] = \{134|26|5|7\},$$
$$Q = [15] \vee [27] = \{15|27|3|4|6\}.$$

Hence $\delta_{IH}(P, Q) = s^P + s^Q - 2s^{P \wedge Q} = 4 + 2 - 0 = 6 > 5 = 7 - 2 = D(P, Q)$.
Conversely, if $A = \{1,2,3\}$ is a maximal subset with $P^A = Q^A = \{1|2|3\}$, then
$2(n - k) = 2(7 - 4) = 6 > 4 = k$ and thus the situation is that of claim 10.
Accordingly, partitions $P, Q$ may be (for example) as follows:

$$P = [14] \vee [26] = \{14|26|3|5|7\},$$
$$Q = [15] \vee [37] = \{15|2|37|4|6\},$$

yielding $\delta_{IH}(P, Q) = 4 = D(P, Q)$.

Finally, case $k = n - 1$ is simple: conditions (10), (11) and $M^*_P \cap M^*_Q = \emptyset$
entail that in one of the two partitions, say $P$, all $n - 1$ elements $j \in A^c$ are
atom-linked with the unique element $\{i\} = A$, entailing $P = P_\top$, while the
other partition has to be $Q = P_\bot$. On the other hand, $\delta_{IH}(P, Q) = \binom{n}{2}$ if and
only if $P = P_\top$, $Q = P_\bot$. Therefore,

$$D(P, Q) = n - 1 \Leftrightarrow P = P_\top, Q = P_\bot \Leftrightarrow \delta_{IH}(P, Q) = \binom{n}{2}.$$

For the upper bound all the above conditions (10), (11) and $M^*_P \cap M^*_Q = \emptyset$
remain valid, but $M^*_P \cup M^*_Q$ must be as large as possible. To achieve this, rather
than distributing the needed atoms $[ij] \in M^*_P \cup M^*_Q$, $i \in A$, $j \in A^c$ in the most
uniform way over the $n - k$ elements $i \in A$ as for the lower bound, it is now
necessary to concentrate them as much as possible, which is easy.

Claim 12 $\max_{P, Q \in \mathcal{P}^N,\ D(P,Q) = k} \delta_{IH}(P, Q) = \binom{n}{2} - \binom{n-k}{2}$ for all $0 \leq k \leq n - 1$.

Proof: Let $A \in 2^N$, $|A| = n - k$ be a maximal subset such that $P^A = Q^A$.
Again, if $k = 0$, then $P = Q$ and there is nothing to show. Otherwise, for
$0 < k \leq n - 1$, choose $P = P_\top$, $Q = \{A\} \cup P^{A^c}_\bot$, entailing $P^A = \{A\}$. Then,
$\delta_{IH}(P, Q) = s^P + s^Q - 2s^Q = s^P - s^Q = \binom{n}{2} - \binom{n-k}{2}$. It seems rather evident
that there is no way of satisfying (10), (11) and mostly $M^*_P \cap M^*_Q = \emptyset$ while
involving a larger number of atoms. ■
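For small ground sets the extremes of $\delta_{IH}$ at each value of $D$ can be tabulated exhaustively, which gives an independent check of the bounds. Concerning the upper bound, note that all $\binom{n-k}{2}$ entries of $I_P$ and $I_Q$ indexed by pairs inside $A$ are matched, so at most $\binom{n}{2} - \binom{n-k}{2}$ entries can differ. The sketch below is one way to build the table; the encoding of partitions as label tuples and all function names are assumptions made for illustration, and no outcome is asserted for the minima.

```python
from itertools import combinations
from math import comb


def set_partitions(n):
    """Yield every partition of {0,...,n-1} as a tuple of block labels."""
    def grow(prefix, used):
        if len(prefix) == n:
            yield tuple(prefix)
            return
        for label in range(used + 1):
            yield from grow(prefix + [label], max(used, label + 1))
    yield from grow([], 0)


def disagreements(p, q):
    """Pairs (i, j), i < j, in a common block of exactly one of p, q."""
    return {(i, j) for i, j in combinations(range(len(p)), 2)
            if (p[i] == p[j]) != (q[i] == q[j])}


def partition_distance(n, bad):
    """n minus the largest |A| containing no disagreeing pair (brute force)."""
    for cardinality in range(n, 0, -1):
        for subset in combinations(range(n), cardinality):
            if not any(pair in bad for pair in combinations(subset, 2)):
                return n - cardinality
    return n


def extremes(n):
    """Return dicts k -> min and k -> max of the IH distance over pairs with D = k."""
    lo, hi = {}, {}
    for p, q in combinations(list(set_partitions(n)), 2):
        bad = disagreements(p, q)
        k, d = partition_distance(n, bad), len(bad)
        lo[k] = min(lo.get(k, d), d)
        hi[k] = max(hi.get(k, d), d)
    return lo, hi


n = 5
lo, hi = extremes(n)
for k in sorted(lo):
    print("k =", k, "min =", lo[k], "max =", hi[k],
          "C(n,2) - C(n-k,2) =", comb(n, 2) - comb(n - k, 2))
```

The last column is the value stated in claim 12; the column of minima may be compared with claims 10 and 11 for the chosen $n$.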



Note that this upper bound attains on pairs of comparable modular elements,
and on such pairs the IH and SB distances coincide. In fact, at each level
$\mathcal{P}^N_{(k)}$, $0 \leq k < n$ of the partition lattice the size function attains its maximum
precisely on modular elements: $\max_{P \in \mathcal{P}^N_{(k)}} s^P = \binom{k+1}{2}$.



6.1 Constrained bounds 

In these lower and upper bounds considered above the cardinality $n - k$ of a
maximal subset $A \in 2^N$ where the two generic partitions $P, Q$ coincide is fixed,
while the form of $P^A = Q^A$ is chosen arbitrarily. In fact, for the lower bound
the choice is $P^A = P^A_\bot$ and for the upper one it is $P^A = \{A\}$. Accordingly, the
constrained version of these bounding problems also fixes $P^A$, through its class
$c^{P^A} = (c^{P^A}_1, \ldots, c^{P^A}_{n-k})$. Considering such a version may be useful for further
seeing in detail how many distinct values are actually taken by $\delta_{IH}(\cdot, \cdot)$ for each
value $k$, $0 < k < n$ of partition-distance $D(\cdot, \cdot)$.

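By claim 12 above, determining the constrained upper bound is simple: the
maximum of $\delta_{IH}(P, Q)$ over pairs with $D(P, Q) = k$ and $P^A$ fixed is taken over
pairs of distinct blocks $B \neq B'$ of $P^A$, with $k = |A^c|$. That is, partition $P$
chooses a largest block $B \in P^A$ and obtains
as $P = (P^A \cup P^{A^c}_\bot) \vee \bigvee_{i \in B,\, j \in A^c} [ij]$, while $Q$ chooses a largest block $B' \in P^A \setminus B$
and obtains as $Q = (P^A \cup P^{A^c}_\bot) \vee \bigvee_{i \in B',\, j \in A^c} [ij]$. Accordingly, the distance is the
sum of $s^P - s^{P^A} = \binom{|B| + k}{2} - \binom{|B|}{2}$ and $s^Q - s^{P^A} = \binom{|B'| + k}{2} - \binom{|B'|}{2}$.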

Like in the free version of the problem, determining the constrained lower
bound is less simple, in general. Still, an immediate adaptation of claim 10 to
this more general situation is: if $k \leq 2c^{P^A}_1$, then the minimum of $\delta_{IH}(P, Q)$ over all
$P, Q \in \mathcal{P}^N$ with $D(P, Q) = k = |A^c|$ and $P^A$ fixed equals $k$.

The lower bound for all other cases where $k > 2c^{P^A}_1$ clearly cannot be
approached by considering separately all possible classes $c^{P^A}$ of partitions of an
$(n - k)$-set, with $n$ arbitrarily large and $0 < k < n$. Conversely, what seems
interesting is an algorithmic view of the problem. In particular, the sought
lower bound may be determined through a greedy construction of a bipartite
graph $G = (V \times V', E)$ where $V = P^A$, $V' = A^c$, $E \subseteq V \times V'$. In words,
vertex subset $V$ contains all blocks of $P^A$, vertex subset $V'$ contains all
elements $j \in A^c$ and any edge $(B, j) \in E$ links a block $B \in P^A$ and an element
$j \in A^c$. To see how this relates to the constrained bounding problem, firstly
let the graph $G_0 = (V \times V', \emptyset)$ with empty edge set correspond to the
initial situation where $P_0 = P^A \cup P^{A^c}_\bot = Q_0$, with $P \geq P_0$, $Q \geq Q_0$ denoting
the two partitions to be constructed by adding edges and such that, eventually,
$\delta_{IH}(P, Q)$ is the sought lower bound. Now consider adding edges one
after the other, while conceiving edge set $E = E^P \cup E^Q$ as partitioned into
two blocks $E^P, E^Q$ corresponding to partitions $P, Q$. This yields a sequence
$G_0, G_1 = (V \times V', E^P_1 \cup E^Q_1), \ldots, G_m = (V \times V', E^P_m \cup E^Q_m), \ldots$ of
bipartite graphs. In particular, let the sequence of edges progressively added one
after the other be such that all links added at an odd step $m = 1, 3, 5, \ldots$ are in
$E^P$, while all links added at an even step $m = 2, 4, 6, \ldots$ are in $E^Q$. Finally, the
main rule for the construction is the following: at the end, every vertex $j \in A^c$
has to be the end-vertex of precisely one added edge. Evidently, this means that
the above sequence terminates exactly at the $k$-th step, where $k = |A^c|$, and
allows for some blocks $B \in P^A$ to remain isolated vertexes in the final graph.

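At any intermediate step $m$, $0 < m < k$, graph $G_m = (V \times V', E^P_m \cup E^Q_m)$
identifies the two (not yet final) partitions $P_m, Q_m$ as follows:

$$P_m = P_0 \vee \bigvee_{(B,j) \in E^P_m,\ i \in B} [ij] \quad \text{as well as} \quad Q_m = Q_0 \vee \bigvee_{(B',j) \in E^Q_m,\ i' \in B'} [i'j].$$

Define $e^P_{G_m}, e^Q_{G_m} : V \to \mathbb{Z}_+$ by

$$e^P_{G_m}(B) = |\{(B, j) : (B, j) \in E^P_m\}| \quad \text{and} \quad e^Q_{G_m}(B) = |\{(B, j) : (B, j) \in E^Q_m\}|$$

for every $B \in V$. Then,

$$\delta_{IH}(P_m, Q_m) = s^{P_m} + s^{Q_m} - 2s^{P^A} = \sum_{B \in V} \left[ \binom{|B| + e^P_{G_m}(B)}{2} + \binom{|B| + e^Q_{G_m}(B)}{2} - 2\binom{|B|}{2} \right].$$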

Now, for any sequence $G_0, G_1, \ldots, G_m$, $0 \leq m < k$ of bipartite graphs as
above, define a weight function $w_{m+1}(\cdot)$ (over edges) as follows: if $m = 0$ or $m$
is even, then $w_{m+1} : (V \times V') \setminus E^P_m \to \mathbb{N}$ assigns an integer weight to every edge
$(B, j) \notin E^P_m$ by

$$w_{m+1}((B, j)) = |B| + 1 + e^P_{G_m}(B),$$

while if $m$ is odd, then $w_{m+1} : (V \times V') \setminus E^Q_m \to \mathbb{N}$ assigns an integer weight to
every edge $(B, j) \notin E^Q_m$ by

$$w_{m+1}((B, j)) = |B| + 1 + e^Q_{G_m}(B).$$



This makes it possible to construct a $(k+1)$-sequence of graphs $G_0, G^*_1, \ldots, G^*_m, \ldots, G^*_k$ in
a greedy fashion, that is, by adding at each step $m$, $0 < m \leq k$ an edge with
minimum weight as given by weight function $w_m(\cdot)$. Then, the sought lower
bound is

$$\sum_{B \in V} \left[ \binom{|B| + e^P_{G^*_k}(B)}{2} + \binom{|B| + e^Q_{G^*_k}(B)}{2} - 2\binom{|B|}{2} \right]$$

and every vertex $B \in V$ which remains isolated in the final graph $G^*_k$ has
$e^P_{G^*_k}(B) = 0 = e^Q_{G^*_k}(B)$, entailing that the corresponding term in the summation
simply vanishes.
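The greedy construction can be sketched in a few lines. The version below rests on several assumptions that are not fixed by the text: the weight of a new edge at block $B$ is taken as $|B| + 1$ plus the number of edges already attached to $B$ on the side ($P$ or $Q$) being extended, ties are broken in favour of the first block, and only the counts $e^P(B), e^Q(B)$ are tracked, since the summation depends on nothing else. The function name `greedy_value` is hypothetical, and the sketch returns the value produced by the procedure without claiming optimality.

```python
from math import comb


def greedy_value(block_sizes, k):
    """Value of the summation after k greedy steps.

    block_sizes: cardinalities |B| of the blocks of P^A (vertex set V).
    k: number of elements of A^c, each attached by exactly one edge.
    """
    e_p = [0] * len(block_sizes)  # e^P(B): edges at B assigned to P
    e_q = [0] * len(block_sizes)  # e^Q(B): edges at B assigned to Q
    for step in range(1, k + 1):
        # Odd steps extend P, even steps extend Q.
        counts = e_p if step % 2 == 1 else e_q
        # Assumed weight |B| + 1 + e(B); min() keeps the first block on ties.
        best = min(range(len(block_sizes)),
                   key=lambda b: block_sizes[b] + 1 + counts[b])
        counts[best] += 1
    return sum(comb(s + ep, 2) + comb(s + eq, 2) - 2 * comb(s, 2)
               for s, ep, eq in zip(block_sizes, e_p, e_q))


# P^A made of two singletons and |A^c| = 5 (n = 7, k = 5).
print(greedy_value([1, 1], 5))
# P^A made of three singletons and |A^c| = 4 (n = 7, k = 4).
print(greedy_value([1, 1, 1], 4))
```

With these inputs the procedure returns the same IH values, 6 and 4, as the two worked examples on the 7-set given earlier in this section.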





7 Concluding remarks 

Quantifying differences between partitions is needed in statistics, where
partitions are clusterings and blocks are clusters. On the other hand, the partition
lattice is very important in lattice theory, where it appears to be the main
example of an indecomposable geometric lattice. While the Hamming distance
between subsets may be extended to any distributive lattice, how to measure
differences between elements of geometric lattices seems disregarded in
combinatorial theory. This paper addresses the issue from alternative perspectives,
and in general shows that any monotone lattice function, such as the rank, may
be used for constructing a distance measure.

Among the measures considered, the IH distance clearly is the analog of
the Hamming distance between subsets. It obtains by focusing on atoms and
through the size function. In particular, the size and the rank of a partition are,
respectively, the maximum and the minimum number of atoms whose join yields
that partition; their difference maximally is $s^{P_\top} - r(P_\top) = \binom{n-1}{2}$. Conversely,
for every subset there is a unique number of atoms whose join (union) yields
that subset, and this number is the rank of the subset.

Given that a variety of distance measures is considered, comparing them
seems natural as well as useful, and to this end any distance measure $\delta(\cdot, \cdot)$ may
be $[0, 1]$-normalized as $\hat{\delta}(\cdot, \cdot) = 1 - [1/(1 + \delta(\cdot, \cdot))]$. In this view, the larger the
range (or image) $\bigcup_{(P,Q) \in \mathcal{P}^N \times \mathcal{P}^N} \hat{\delta}(P, Q) = R(\hat{\delta}) \subset [0, 1]$ of a normalized distance,
the more precise and granular this latter is. Pushing the comparison into a
ranking, the less attractive (normalized) distances appearing above are those
satisfying modularity and co-maximality, hence the rank-based $\delta_{RB}$ distance
and the size-based $\delta_{SB}$ one. Apart from their range, these two measures are not
able to appreciate that non-modular partitions have many complements, some
of which (strictly) coarser than others [13], and this is a main flaw. Next come
the modified RB distance $\delta^+_{RB}$ and partition-distance $D$; their range is rather
small if the aim is at distinguishing between all possible differences between
partitions. The SD distance $\delta_{SD}$ has a larger range, but still smaller than IH
distance $\delta_{IH}$.

From a final perspective, determining $D(P, Q)$ for generic partitions $P, Q$ is
a computational problem whose solution requires polynomial time [5, theorem
2.1, p. 160]. On the other hand, if prepared to use binary $\binom{n}{2}$-arrays as data
structures, then by (8) IH distance is $\delta_{IH}(P, Q) = \langle I_P - I_Q, I_P - I_Q \rangle$, where $\langle \cdot, \cdot \rangle$ denotes scalar
product while $I_P : \mathcal{P}^N_{(1)} \to \{0, 1\}$ is the indicator function (or binary $\binom{n}{2}$-array
representation) of partitions $P \in \mathcal{P}^N$ introduced above.
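The array-based computation can be illustrated as follows. This is a minimal sketch, assuming partitions of $\{1, \ldots, n\}$ are supplied as lists of non-singleton blocks and that the $\binom{n}{2}$ entries are ordered lexicographically by pair; the names `indicator`, `dot` and `delta_ih` are illustrative only.

```python
from itertools import combinations


def indicator(blocks, n):
    """Binary C(n,2)-array I_P: entry 1 for each pair {i, j} in a common block.
    Elements of {1,...,n} not listed in any block are singletons."""
    block_of = {x: b for b, block in enumerate(blocks) for x in block}
    return [int(i in block_of and block_of.get(i) == block_of.get(j))
            for i, j in combinations(range(1, n + 1), 2)]


def dot(u, v):
    """Scalar product of two integer arrays of equal length."""
    return sum(a * b for a, b in zip(u, v))


def delta_ih(ip, iq):
    """Indicator-Hamming distance as the squared norm of I_P - I_Q."""
    diff = [a - b for a, b in zip(ip, iq)]
    return dot(diff, diff)


# P = {134|26|5|7} and Q = {15|27|3|4|6} on a 7-set.
ip = indicator([{1, 3, 4}, {2, 6}], 7)
iq = indicator([{1, 5}, {2, 7}], 7)
print(dot(ip, ip), dot(iq, iq), dot(ip, iq), delta_ih(ip, iq))
```

Here $\langle I_P, I_P \rangle = s^P$, $\langle I_Q, I_Q \rangle = s^Q$ and $\langle I_P, I_Q \rangle = s^{P \wedge Q}$, so the last printed value agrees with $s^P + s^Q - 2s^{P \wedge Q}$.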